462 research outputs found

    Contextual Out-of-Domain Utterance Handling With Counterfeit Data Augmentation

    Neural dialog models often lack robustness to anomalous user input and produce inappropriate responses, which leads to a frustrating user experience. Although there is a set of prior approaches to out-of-domain (OOD) utterance detection, they share a few restrictions: they rely on OOD data or multiple sub-domains, and their OOD detection is context-independent, which leads to suboptimal performance in a dialog. The goal of this paper is to propose a novel OOD detection method that does not require OOD data by utilizing counterfeit OOD turns in the context of a dialog. To foster further research, we also release new dialog datasets, which are 3 publicly available dialog corpora augmented with OOD turns in a controllable way. Our method outperforms state-of-the-art dialog models equipped with a conventional OOD detection mechanism by a large margin in the presence of OOD utterances. Comment: ICASSP 201

    Three essays on social policy

    This dissertation consists of three essays that examine the impact of the Supplemental Nutrition Assistance Program (SNAP) work requirement on program participation using national and state-level administrative data, and the impact of cannabis legalization on behavioral changes of individuals using national data. In chapter 1, I investigate whether the SNAP work requirement affects SNAP participation of able-bodied adults without dependents (ABAWDs) in the State of Missouri. Since 1996, ABAWDs have been required to work or participate in work programs at least 20 hours per week for SNAP eligibility. Using the sample of SNAP household heads and their youngest child from administrative data for Missouri from 2004 to 2010, I compare the hazard of exit, exploiting the fact that the work requirement exemption depends on the age of the youngest child and the county of residence. I find a modest reduction in SNAP participation due to the SNAP work requirement. The cumulative survival estimation indicates that the youngest child and household head are less likely to continue to participate in SNAP for the next 24 months by 4.6 and 7.4 percentage points, respectively, when they are expected to be subject to the work requirement. In chapter 2, I extend the analysis to the impact of the work requirement on the chance of reoffending. Based on the National Corrections Reporting Program (NCRP) data between 2010 and 2017, I employ a linear probability model to measure the effect of the SNAP work requirement on the chance of returning to prison. To establish the causal relationship between the requirement and recidivism, I use two SNAP policy variations, the work requirement imposition and SNAP lifetime ban policies, which affect offenders differently based upon their release dates, counties of residence, types of offenses, and age at release. The results show that the additional SNAP work requirement imposition increases the chance of returning to prison within three years by 0.22 and 0.47 percentage points for non-drug and drug offenders, respectively. In the final chapter, I investigate the effect of recreational cannabis legalization on the chance of returning to prison for ex-offenders. To date, 21 states and D.C. have passed legislation permitting the use of cannabis for recreational purposes. However, previous studies yield mixed results regarding the relationship between criminal activities and cannabis legalization. Moreover, there is no previous study of how cannabis legalization affects the chance of recidivism for ex-offenders. Using the NCRP data between 2006 and 2019, I employ a difference-in-differences model using the different timing of cannabis legalization in six states. My results provide no evidence that cannabis legalization affects the chance of returning to prison within one year after release. The main result holds true for different specifications, including the exclusion of California, altering the definition of neighboring states, or substituting cannabis sales dates for the dates of legalization. Nor are there effects for subgroups based on ex-offenders' characteristics. Includes bibliographical references

    Constrained Policy Optimization for Controlled Self-Learning in Conversational AI Systems

    Recently, self-learning methods based on user satisfaction metrics and contextual bandits have shown promising results in enabling consistent improvements in conversational AI systems. However, directly targeting such metrics with off-policy bandit learning objectives often increases the risk of making abrupt policy changes that break the current user experience. In this study, we introduce a scalable framework for supporting fine-grained exploration targets for individual domains via user-defined constraints. For example, we may want to ensure fewer policy deviations in business-critical domains such as shopping, while allocating more exploration budget to domains such as music. Furthermore, we present a novel meta-gradient learning approach that is scalable and practical to address this problem. The proposed method adjusts constraint violation penalty terms adaptively through a meta objective that encourages balanced constraint satisfaction across domains. We conduct extensive experiments using data from a real-world conversational AI on a set of realistic constraint benchmarks. Based on the experimental results, we demonstrate that the proposed approach is capable of achieving the best balance between the policy value and constraint satisfaction rate.